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Abstract 

We devise a method for designing materials that will have some desired structural char- 
acteristics. We apply it to multiblock copolymers that have two different types of monomers, 
A and B. We show how to determine what sequence of A's and B's should be synthesised in 
order to give a particular structure and morphology. Using this method in conjunction with 
the theory of microphase separation developed by Leibler, we show it is possible to efficiently 
search for a desired morphology. The method is quite general and can be extended to design 
isolated heteropolymers, such as proteins, with desired structural characteristics. We show 
that by making certain approximations to the exact algorithm, a method recently proposed 
by Shakhnovich and Gutin is obtained. The problems with this method are discussed and we 
propose an improved approximate algorithm that is computationally efficient. 
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The problem addressed in this letter is the following. Is there an efficient method for 
designing a material with a particular morphology or structure? We develop a systematic 
approach to this problem that we illustrate for the design of copolymeric materials. 

Structures of single chains in solution have also been extensively studied, often in relation 
to the important biological question of how to determine the structure of a protein from 
its sequence fl], §]. Work on the design of a chemical sequence which has a desired three 
dimensional structure has also been recently considered ||. For the two dimensional model 
of Dill et al. Q it has been possible to devise a set of rules that achieve a desired tertiary 
structure In three dimensions, much less is known. Ad hoc methods have been attempted^, 
|6| but recent tests have shown that they are not entirely efficacious^. 

To illustrate our general method, we will apply it to design of block copolymers, which are 
polymers made out of more than one chemical species. We will consider copolymeric systems 
made up of two constituent types of monomers denoted A and B. The phase diagram of such 
materials has been studied as a function of the Flory interaction parameter x an d the lengths 
of the segments of A and B. Lamellar, hexagonally closed packed, body centred cubic ||, and 
gyroidQ phases have been predicted. Experimentally copolymers have been found to exhibit 
a variety of different structures, sometimes referred to as microphases. 

Suppose we would like to design a new phase that has a given symmetry. Until now, it was 
necessary to do an inefficient search through the phase diagram of the system in order to find 
the desired symmetry. We show below that we can find a function that when minimised can 
home in on the correct structure. We then implement this practically in the framework of the 
theory of copolymers developed by Leibler||]. We then turn to the problem of protein design 
and discuss how the method we developed can be used in this context. 

First we wish to determine the correct function to minimise in order to obtain the best 
sequence corresponding to the desired morphology. We begin with a formulation of the general 
problem that we wish to solve. Consider a system with coordinates denoted by T and with 
a chemical sequence denoted by S. We can define a function which tells us whether each 
structure T is in the desired set of structures. Call this P s truct(T)- It is a constant if T belongs 
to the class of desired structures and otherwise. In practice we will express P s truct{^) in 
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terms of the clamping potential defined below. Consider next the probability that a sequence 
S gives a desired structure. Since this is a conditional probability we denote it by P(S\struct). 
We wish to find the maximum of P(S\struct) over all sequences S keeping the structure fixed. 
To uniquely define P(S\struct), we have to choose the a priori probability of choosing an 
arbitrary sequence S. The simplest choice is that it is uniform, that is P(S) is constant. We 
consider the system in equilibrium at a finite temperature so the probability that a sequence 
S has structure T is given by the Boltzmann factor P(r|S') = exp(— [Hs(T) — F (S)]/T), 
where Hs(T) is the Hamiltonian of the system with a particular sequence S, and F (S) is the 
corresponding free energy of the system with S kept constant. Bayes' theorem states that 
the joint probability P(T, S) = P(T\S)P(S) = P(S\T)P(T), hence after some algebra, we 
obtain 

P(S\struct) = Y, P struct{T)P{S\T) = £ P' struct (T) exp(-(# s (r) - F (S))/T) (1) 
r r 

where 

pi /p\ _ Pstruct^X) /r>\ 

strucA ) - EseM _ {Hs{T) _ Fo (S))/TY [ > 

Because P' s truct does not depend on the sequence S, we can equally well regard it rather than 
Pstruct as given. It can be thought of as imposing an external clamping potential on the 
system, pushing it into the correct structure, P' s truct^X) = ex P( — V ex t(F)/T) . We will make 
an appropriate choice for the clamping potential, V ex t(T) for each problem, based on physical 
reasoning rather than using (|) directly. Thus @) can be further reduced to 

P{S\struct) cx eM-(Fstmct(S) - F {S))/T) = exp(-AF/T) (3) 

where F s t ruc t{S) is the free energy pushed into a certain structure by the clamping potential 

eM-Fstmct(S)/T) = £ exp((-if 5 (r) + V ext (T))/T) (4) 

r 

The physical interpretation of this is clear. The optimum sequence is the one that minimises 
the difference AF between the unrestricted free energy and the free energy clamped in the 
desired structure. Intuitively this is reasonable because it picks out a sequence that naturally 
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wants to spend a lot of time in this structure. This result is a generalisation of that used in 
determining couplings in Boltzmann machines (TO]. 

As a first example, we apply the formalism above to design copolymeric systems with 
desired lattice structures. Leibler|| has developed a theory of microphase separation for di- 
block copolymers. Here we have further generalised this to m blocks each block being made 
up entirely of A monomers or of B monomers, the ith having a length li, i = 1 . . . m . The 
fractional length of the ith block is fi = 1%/L where L is the total chain length. The A and 
B monomers have an incompatibility, or Flory, parameter x giving the degree to which the 
A and B monomers wish to segregate. Leibler took as the order parameter Ap, the ensemble 
average of the difference between the density of ^4's and the average density of A's. He was 
able to construct an expansion of the free energy in terms of Ap and calculated explicitly the 
expansion to fourth order in terms of the underlying chemical structure of the chains, that is, 
h, I2, and x- This theory should work well near the spinodal point for this system, because 
Ap is small there. 

To determine the stability of density variations with different crystallographic symmetries, 
Leibler took Ap to be a periodic function of position r, 

n 

M r ) = J2 ^(flj) expfab' ' r )' ( 5 ) 
i=i 

choosing the q/s, j = 1, . . . , n to be the smallest non-zero reciprocal lattice vectors of the lattice 
structure being considered. The magnitude, q*, of the q^-'s is taken to be the wavevector at 
the spinodal point where divergent fluctuations first appear. He further took the magnitude, 
but not the phase, of all the ^(q^'s to be equal. By choosing different qj's and minimising the 
resultant free energies he computed which crystal structure had the lowest free energy. He was 
able to obtain a phase diagram for the system as a function of fx and xiV. Besides the high 
temperature disordered phase he found that the body centred cubic, triangular (hex.), and 



lamellar (lam.) phases existed in different regions of the phase diagram. Further work[ll, 12| 
extended this treatment to triblocks. 
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It is convenient to take V ex t(T) to be smoothly varying: 



n 



V ext {r) 



= —v 



^exp(iqj • r). 



(6) 



The magnitude of v adjusts the degree to which Ap fluctuates. A clamping potential with, for 
example, hexagonal symmetry is the sum of three plane waves. Applying such a potential to 
the undamped free energy Fq, we obtain a clamped free energy 



which will tend to push the system into a phase with the symmetry of the external potential. 

Unfortunately in the general formalism developed above it is necessary that the clamping 
potential V ex t is very large when monomers stray from the desired structure. This requirement 
leads to a large Ap and hence is incompatible with the limits of validity of the free energy 
expansion of Leibler which is only valid in the limit of weak segregation. Therefore we need 
to consider complications that arise when only a weak clamping potential is applied. 

For small V ex t, the system will not always be pushed into the symmetry of the external 
potential. In fact, for the values of v that we use, the effect of adding V ex t is only to slightly 
enlarge the region of the phase diagram that has the symmetry of V ex t- Our algorithm deter- 
mines which phase the system is in by allowing the magnitudes of the ^(qj)'s to be unequal 
and then minimising with respect to them. This allows the possibility of mixed phases with 
more than one type of symmetry. For example if we consider the hexagonal phase and write 
@ in terms of its wave vectors qi , q2 , and q3 , 



then for Viam = *Phex we have a hexagonal structure (n = 3 for the hex. phase), and for 
Tphex = we have a lamellar structure (n = 1 for the lam. phase). 
For small V ex t, 




(7) 



Ap(r) = ipi am exp(?qi • r) + ip hex (ex.p(iq 2 ■ r) + exp(iq 3 • r)) 



(8) 



AF = F(V ext ) - F 



v(dAF/dv) 



= —v 



(9) 
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where n c is the number of qj)'s that V ex t has in common with Ap(r). From this it can be seen 
that AF does not do quite what one would like. In general a clamping potential will have 
reciprocal lattice vectors in common with more than one type of symmetry in Ap(r). This 
means that AF will be lowered for other phases in addition to the one with the symmetry 
of V ex t- For example, for a clamping potential with hexagonal symmetry Vh ex , AF(Vhex) = 
F(Vhex) ~ Fq wm be lowered for the lamellar phase as well as the hexagonal. We would like 
a functional which is more selective in order to design a hexagonal material. The solution in 
this case is to subtract the lamellar component as follows. Looking at (|8]), we see that the 
appropriate functional to maximise is iphex- We would like to express iphex in terms of free 
energy differences. From (Q) iphex = —{dAF{V} iex )/dv) — dAF(Vi am )/dv)/2. For small v this is 
proportional to FiV^) — F(V[ am ). Therefore minimising the difference in free energy between 
hexagonal and lamellar clamping will bring the system into the hexagonal phase. Fig. |l|(a) 
shows (F(Vhex) — FiYiam)) as a function of the fraction / for diblock copolymers at x = 20. 
The minimum / = 0.32 occurs in the correct position indicated by Leibler's theory. 

We also tested out this method for designing diblock bcc structures. Fig. |l|(b) shows 
F{Vb cc ) — F(Vh ex ) as a function of / at x = 20. The minimum / = .23 also occurs in the 
correct position but there is also a secondary small minimum at / = .32. This shows that 
there is no guarantee that there will only be one minimum for this minimisation function. 

Now we turn to the problem of protein design. An interesting approach has recently 
been proposed by Shakhnovich and Gutin|5], [|(SG), who proposed a method for solving this 
problem for a simplified lattice model using a self avoiding chain. The model they employed 
has sequences {cij} of two possible monomer types that are given values ±1, for chains of 
length N. This, plus the positions of all the monomers {r^}, completely describe the state of 
the chain. We wish to find a sequence that causes the chain to fold up into a desired structure, 
but we do not care about what the monomer types end up being. The energy is 

1 N 

E(Wi}, {n}) = - £(B + Ba^Ain - r,-) (10) 

i,i 

Here we will consider the case where A(rj — r^) = 1 if and are nearest neighbours, and 
is zero otherwise. One sets B < 0, since this favours ferromagnetic ordering of the cr's, which 
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means that the monomers will want to segregate, and Bq < since this provides an attractive 
interaction between monomers causing the protein to collapse. 

Their method of sequence determination was to do a constrained minimisation of the 
energy of the chain in sequence space. The constraint was that the total magnetisation was 
held constant, in practice, close to zero. Note that without this constraint, all the a's would 
become equal, and one would have a homopolymer which does not have a well defined structure. 
Unfortunately it appears that even with constrained minimisation, sequences found do not 
necessarily have to have the desired structures as the lowest energy states J*]]. Even if the 
desired structure is a ground state, there may be a large ground state degeneracy in which 
case the structure is ill defined, as in the case of the unconstrained minimisation just mentioned. 

The correct functional to minimise is AF defined in (||). If we specialise to the problem 
considered here, the constraining potential is a delta function since our structure is to be pre- 
cisely determined. Calling the coordinates of the desired structure {rf}, the correct functional 
to minimise is, according to (|3|), 

AF = E({<7i}, {rf }) — F ({<7j}). (11) 

SG's method is a minimisation of only the first term, with a constraint of constant total 
magnetisation. We will now argue that second term is not negligible and cannot be omitted. 
In fact, we will see that a crude approximation gives an answer similar to the constraint of 
constant magnetisation. But this approximation is of dubious validity which is why it failed. 

Take Bq to be very large and negative so that all stable structures must be globular with 
minimal surface area. The coordinates of our structure {r^} are therefore constrained to be 
of this compact type and the term in the energy involving Bq will not vary and can now be 
ignored in the minimisation. The space of all conformations {r{\ we need to consider are also 
compact conformations of the same overall shape, but different internal arrangements. We now 
expand out F({<Ji}) keeping only the lowest order cumulant. F({<7j}) f» (E) + constant. The 
angled brackets denote an average that is equally weighted over all compact conformations with 
minimal surface area. Ignoring constant terms, this gives F Q ({ai}) « a i a j(^-i v i ~ r j)) 
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so 



AF«-X;[A(r i -r,-)-(A(r i -r i )>] 



(12) 



One can easily show that the nearest neighbour interactions along the backbone of the chain 
cancel, because the probability that monomers i and i + 1 are next to each other is unity This 
must happen because this kind of interaction is the same for all configurations and consequently 
cannot play a role in choosing the optimum ex's. We now find an approximate functional form 
for (A(i-j — tj)). Since the chain is compact there is a short screening length. We therefore 
expect random walk correlations when \i — j\ is greater than a few lattice spacings. However 
when \i — j\ 1 / 2 becomes of order the diameter of the protein, the conformations cease to look 
like random walks as the protein is compact. This crossover corresponds to — ~ N 2 / 3 . For 
scales larger than this the correlation function should be almost constant. Therefore 



Therefore in this approximation, F({ai}) looks like a one dimensional Ising model with the 
above long range interaction. 

If we ignore the variation of (A(rj — rj)} with \i — j\, i.e. (A(rj — r^)) = 1/N so AFmf = 
E({(Ji}, {rf }) — a i) 2 1 then F Q gives an infinite range mean field contribution that 

is antiferromagnetic, and hence acts as a "soft" constraint favouring a total magnetisation 
of zero. To have a total magnetisation of zero, one must introduce a domain wall, which 
increases E({<Ji}, {rf}) by of order iV 2 / 3 , but this is more than compensated by the gain 
in free energy of the second term which is of order N. Hence in this limit we recover the 
approximation of SG. However this is a rather drastic approximation. Even within the first 
order expansion derived above, one is not justified in neglecting the shorter distance variation 
of (A(i"i — r^)) because, within this mean field approximation, there is a spurious degeneracy 
in the cr's that minimise AFmf- This is because the three dimensional arrangement of the 
cTj's namely cr(r) is independent of the desired conformation {rf }. That is, to find the correct 
sequence, it is not necessary to consider the different internal arrangements of the chain inside 
the compact cluster, as they all give an identical AFmf- After the minimisation of AFmf has 




(13) 
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been performed once, the three dimensional arrangement of the cr's do not change when the 
desired conformation is changed. 

However a non-constant (A(rj — r,-)} breaks this degeneracy. It adds antiferromagnetic 
couplings along the backbone of the chain. This means that the domain wall should tend 
to orient itself roughly perpendicular to the direction of the backbone of the chain to satisfy 
the antiferromagnetic couplings. The contribution to AF due to the \i — j|~ 3//2 decay of 
(A(rj — r-j)) is substantial and cannot be neglected. We can easily estimate it for the case of 
a desired structure {r®} that is in a typical random configuration. For a piece of arclength 
s = — = N 2 / 3 the contribution to AF is of order Bs 2 /s 3 ^ 2 = BL 1 / 3 . But there are L/L 2 / 3 
such pieces, giving a total contribution of order BL 2 ! 3 . This precisely the same order as the 
energy of the domain wall and therefore must be considered in doing protein design. 

In conclusion, we have developed a method to design molecules that will self assemble 
into a desired structure. We used this to design block copolymers with desired structural 
characteristics, within the framework of Leibler's mean field theory. 

We also note that the method for design described above should also work for the problem 
of protein design, for which no trustworthy methods have been devised so far. It would be 
interesting to see how well the approximate minimisation function, given by (|l2|), designs stable 
proteins. 
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Figure 1: (a) F{Vh ex ) — F(Vi am ) and (b)-F(Vf, cc ) — F(Vh ex ), as a function of the fraction / for diblock 
copolymer design. The minima give the best choice for the design of hexagonal material and bcc 
material, respectively. 
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